Smoothed Analysis of Trie Height

Authors

  • Stefan Eckhardt
  • Sven Kosub
  • Johannes Nowak
Abstract

Tries are very simple general-purpose data structures for information retrieval. A crucial parameter of a trie is its height. In the worst case the height is unbounded when the trie is built over a set of n strings. Analytical investigations have shown that the average height under many random sources is logarithmic in n. Experimental studies of trie height suggest that this also holds for non-random data. To give an analytical explanation for these findings, we perform a smoothed analysis of trie height. Smoothed analysis combines elements of both worst-case and average-case analysis: the paradigm assumes that inputs are chosen by an adversary and that the costs are expected costs over slight random perturbations of the chosen inputs. We consider a special class of string perturbation functions which can be modelled by probabilistic finite automata (PFAs). These perturbation functions constitute an extension of a very natural class of random edit perturbations. We observe that for the case of random deletions the smoothed trie height is unbounded. We introduce read-deterministic star-like perturbation functions, which include random substitutions and insertions as special cases, and give a necessary and sufficient condition for the smoothed trie height under read-deterministic star-like perturbation functions to be logarithmic. This condition is particularly appealing since it can easily be checked by looking at the transition probabilities of the corresponding probabilistic automaton.
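The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: it assumes finite, distinct binary strings in place of the infinite strings usually considered, it measures height as one plus the longest prefix shared by any two strings, and it models only the simplest perturbation mentioned above, independent random substitutions. The names `trie_height`, `substitute`, and `adversarial_input` are hypothetical.

```python
import random


def lcp_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trie_height(strings):
    """Height of the trie over distinct strings.

    Each string sits at a leaf as soon as its prefix is unique, so the
    height is 1 + the longest prefix shared by two strings. After sorting,
    that maximum is attained by some adjacent pair.
    Assumption: strings are distinct; duplicates would only be separated
    by the finite truncation used here.
    """
    if len(strings) < 2:
        return 0
    ordered = sorted(strings)
    return 1 + max(lcp_len(a, b) for a, b in zip(ordered, ordered[1:]))


def substitute(s, p, rng, alphabet="01"):
    """Random-substitution perturbation (illustrative model only).

    Each symbol is replaced, independently with probability p, by a
    symbol drawn uniformly from the alphabet.
    """
    return "".join(rng.choice(alphabet) if rng.random() < p else c for c in s)


def adversarial_input(prefix_len=40, bits=6):
    """2**bits distinct strings sharing a long all-zero prefix."""
    return ["0" * prefix_len + format(i, "0{}b".format(bits))
            for i in range(2 ** bits)]


if __name__ == "__main__":
    rng = random.Random(0)
    strings = adversarial_input()
    perturbed = [substitute(s, 0.25, rng) for s in strings]
    print("height before perturbation:", trie_height(strings))
    print("height after perturbation: ", trie_height(perturbed))
```

The adversarial input forces a height that grows with the shared prefix rather than with n. Running the script prints the height before and after perturbation for one random draw, which gives an informal feel for the quantity the smoothed analysis bounds in expectation.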

Related resources

An Analysis of the Height of Tries with Random Weights on the Edges

We analyze the weighted height of random tries built from independent strings of i.i.d. symbols on the finite alphabet {1, . . . , d}. The edges receive random weights whose distribution depends upon the number of strings that visit that edge. Such a model covers the hybrid tries of de la Briandais (1959) and the TST of Bentley and Sedgewick (1997), where the search time for a string can be dec...

On the Analysis of the Average Height of a Digital Trie: Another Approach

The average height of a digital trie has been recently investigated in many papers [2]-[8]. In most works on binary digital tries, a Bernoulli model and independent keys are assumed. We relax these assumptions in that V-ary asymmetric tries, Bernoulli and Poisson models, and dependent keys are considered. We show that the average height of the trie is asymptotically equal to 2 19u n (for the Bern...

Smoothed Analysis of Binary Search Trees and Quicksort under Additive Noise

Binary search trees are a fundamental data structure and their height plays a key role in the analysis of divide-and-conquer algorithms like quicksort. Their worst-case height is linear; their average height, whose exact value is one of the best-studied problems in average-case complexity, is logarithmic. We analyze their smoothed height under additive noise: An adversary chooses a sequence of ...

A Study of Trie-like Structures under the Density Model

We consider random tries constructed from sequences of i.i.d. random variables with a common density f on [0,1] (i.e., paths down the tree are carved out by the bits in the binary expansions of the random variables). The depth of insertion of a node and the height of a node are studied with respect to their limit laws and their weak and strong convergence properties. In addition, laws of th...

Analysis of random LC tries

LC tries were introduced by Andersson and Nilsson in 1993. They are compacted versions of tries or patricia tries in which, from the top down, maximal height complete subtrees are level compressed. Andersson and Nilsson (1993) showed that for i.i.d. uniformly distributed input strings, the expected depth of the LC patricia trie is (log n). In this paper, we refine and extend this result. We anal...

Publication date: 2007